University of Chicago

We consider parameter estimation, hypothesis testing and variable
selection for partially time-varying coefficient models. Our asymptotic
theory has the useful feature that it allows dependent, nonstationary
error and covariate processes. With a two-stage method, the parametric
component can be estimated with an $n^{1/2}$-convergence rate. A
simulation-assisted hypothesis testing procedure is proposed for testing
significance and parameter constancy. We further propose an information
criterion that can consistently select the true set of significant
predictors. Our method is applied to autoregressive models with
time-varying coefficients. Simulation results and a real data
application are provided.

1. Introduction. Varying coefficient models have been extensively
studied in the literature, and they are useful for characterizing
nonconstant relationships between predictors and responses in regression
models; see, for example, [19, 20, 28, 29, 33, 44, 52, 55]. In this
paper we consider the time-varying coefficient model

(1.1)    $y_i = x_i^\top \beta_i + e_i, \quad i = 1, \ldots, n,$

where $y_i$ is the response, $x_i$ is the predictor, $\top$ is the
transpose operator, $\beta_i = \beta(i/n)$ for some smooth function
$\beta : [0, 1] \to \mathbb{R}^p$ and $e_i$ is the error. We assume that
$E(e_i \mid x_i) = 0$. Estimation of the coefficient function
$\beta(\cdot)$ in model (1.1) has been considered by [6, 41, 45, 46, 56]
among others. An important special example of (1.1) is the time-varying
autoregressive model [13, 37], obtained by letting
$x_i = (y_{i-1}, \ldots, y_{i-p})^\top$. There are important differences
between our model (1.1) and those under the longitudinal or functional
setting. Here we assume that only one realization
$(x_i, y_i)_{i=1}^n$ is available, while in the longitudinal and
functional setting, many subjects are measured at multiple times, so
that one has multiple realizations.

A natural problem for model (1.1) is to test whether certain or all
components of $\beta_i$ are time-invariant. There is a huge literature on
the problem of testing parameter stability; see, for example, [2, 5, 9,
14, 16, 18, 27, 31, 32, 38, 40, 42, 53]. For model (1.1), we are
interested in testing

(1.2)    $H_0 : A\beta(\cdot) = a$

for some vector $a \in \mathbb{R}^s$, where $A \in \mathbb{R}^{s \times p}$
is a real-valued matrix. With an appropriately chosen $A$, the null
hypothesis (1.2) can be formulated to test whether a certain part of the
coefficients is zero or time-invariant. In the latter case, $a$ needs to
be replaced by an estimate $\hat a$. Zhou and Wu [56] built simultaneous
confidence tubes for the regression coefficient function $\beta(\cdot)$,
which can be used as an $\mathcal{L}^\infty$-test for (1.2). The latter
test often does not have good power if the alternative hypothesis
consists of nonzero smooth functions. In Section 3.2 we propose a more
powerful $\mathcal{L}^2$-test which is based on the weighted integrated
squared errors. Our setting is much more general than the one in [8]
where $(x_i, e_i)$ is assumed to be $\beta$-mixing and stationary. In
comparison, we allow nonstationary predictor and error processes which
can be nonstrong mixing; see Section 2 for our nonstationary framework
and basic assumptions.

If some of the coefficients are time-invariant, model (1.1) becomes the
(semiparametric) partially time-varying coefficient model

(1.3)    $y_i = x_{D_1,i}^\top \beta_{D_1} + x_{D_2,i}^\top \beta_{D_2}(i/n) + e_i, \quad i = 1, \ldots, n,$

where $D_1, D_2 \subset D^* = \{1, \ldots, p\}$ are groups of parametric
and nonparametric components, respectively. Based on an estimate of
(1.1), we simply take an integration/average over the parametric part to
obtain an estimate of $\beta_{D_1}$ that achieves the
$n^{1/2}$-convergence rate. An asymptotic theory is given in Section 3.1.
This method was previously used in [55] for estimating state-domain
semivarying coefficient models. The latter paper assumed that
$y_i = x_i^\top \beta(u_i) + e_i$ where $(y_i, x_i, u_i)$,
$i = 1, \ldots, n$, are independent and identically distributed; see [51]
for the case with stationary mixing processes. Our time-domain model
(1.3) is very general, and it includes both (1.1) and the usual linear
regression models. Gao and Hawthorne [23] considered a special example of
(1.3) with $D_1 = \{2, \ldots, p\}$, $D_2 = \{1\}$ and $x_{D_2,i} = 1$,
so that only the intercept term in (1.3) is time-varying.

Section 3.3 deals with the problem of selecting significant predictors. Fan 
et al. [17] proposed an extended AIC for choosing locally significant vari- 
ables. Abramovich et al. [1] considered the problem of order selection for 
time-varying autoregressive models by requiring that multiple realizations 
are available. Using the dependence framework in Section 2, we are able to 
solve this problem under parameter instability and temporal dependence.
In particular, we propose an information criterion, consisting of measures of 
goodness-of-fit and model complexity, that can consistently select the true 
set of relevant predictors based only on one realization. Section 4 provides 
simulation studies and an application. Proofs are given in the Appendix. 

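2. Model assumptions. Since the coefficient function $\beta(\cdot)$ in
(1.1) is smooth, we can naturally estimate it (along with its
derivative) by

(2.1)    $(\hat\beta(t), \hat\beta'(t)) = \arg\min_{\eta_0, \eta_1 \in \mathbb{R}^p} \sum_{i=1}^n \{y_i - x_i^\top \eta_0 - x_i^\top \eta_1 (i/n - t)\}^2 K\{(i/n - t)/b_n\},$

where $K(\cdot)$ is the kernel function, and $b_n$ is a bandwidth
sequence satisfying $b_n \to 0$ and $n b_n \to \infty$. Throughout the
paper we assume that the kernel function $K(\cdot)$ is a symmetric and
bounded function in $\mathcal{C}^1[-1, 1]$ with
$\int_{-1}^{1} K(v)\,dv = 1$. For example, it can be the Epanechnikov
kernel $K(v) = 3\max(0, 1 - v^2)/4$ or the Bartlett kernel
$K(v) = \max(0, 1 - |v|)$. Observe that (2.1) has the closed form
solution

(2.2)    $\begin{pmatrix} \hat\beta(t) \\ b_n \hat\beta'(t) \end{pmatrix} = \begin{pmatrix} U_{n,0}(t) & U_{n,1}(t) \\ U_{n,1}(t) & U_{n,2}(t) \end{pmatrix}^{-1} \begin{pmatrix} V_{n,0}(t) \\ V_{n,1}(t) \end{pmatrix},$

where for $l \in \{0, 1, 2\}$,

    $U_{n,l}(t) = (n b_n)^{-1} \sum_{i=1}^n x_i x_i^\top \{(i/n - t)/b_n\}^l K\{(i/n - t)/b_n\},$

with the convention that $0^0 = 1$, and

    $V_{n,l}(t) = (n b_n)^{-1} \sum_{i=1}^n x_i y_i \{(i/n - t)/b_n\}^l K\{(i/n - t)/b_n\}.$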
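The closed form (2.2) can be sketched numerically as follows. This is a minimal illustration rather than the authors' implementation: the function names `epanechnikov` and `local_linear` are hypothetical, NumPy is assumed to be available, and the Epanechnikov kernel is used as the default.

```python
import numpy as np

def epanechnikov(v):
    """Epanechnikov kernel K(v) = 3 max(0, 1 - v^2) / 4."""
    return 0.75 * np.maximum(0.0, 1.0 - v ** 2)

def local_linear(y, X, t, b, kernel=epanechnikov):
    """Return (beta_hat(t), beta_hat'(t)) by solving the system in (2.2).

    y: responses, shape (n,); X: predictors, shape (n, p);
    t: time point in [0, 1]; b: bandwidth.
    """
    n, p = X.shape
    u = (np.arange(1, n + 1) / n - t) / b   # scaled distances (i/n - t)/b
    w = kernel(u)                           # kernel weights
    # U_{n,l}(t) for l = 0, 1, 2 and V_{n,l}(t) for l = 0, 1
    # (NumPy evaluates 0.0 ** 0 as 1, matching the convention 0^0 = 1).
    U = [(X * (w * u ** l)[:, None]).T @ X / (n * b) for l in range(3)]
    V = [X.T @ (w * u ** l * y) / (n * b) for l in range(2)]
    lhs = np.block([[U[0], U[1]], [U[1], U[2]]])
    rhs = np.concatenate([V[0], V[1]])
    sol = np.linalg.solve(lhs, rhs)         # stacks beta_hat(t) and b * beta_hat'(t)
    return sol[:p], sol[p:] / b
```

When the coefficient function is exactly linear in time and there is no noise, the local linear fit reproduces it up to rounding error, which gives a quick sanity check of the implementation.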

To establish an asymptotic theory for $\hat\beta(\cdot)$, we need to
impose appropriate regularity conditions on the covariates $(x_i)$ and
errors $(e_i)$. For testing the hypothesis (1.2), [8] assumed that
$(x_i, e_i)$ is $\beta$-mixing and stationary. To allow nonstationary
predictor and error processes that can be nonstrong mixing, we assume
that

(2.3)    $x_i = G(i/n; \mathcal{F}_i)$ and $e_i = H(i/n; \mathcal{F}_i),$

where $\mathcal{F}_i = (\ldots, \varepsilon_{i-1}, \varepsilon_i)$ is a
shift process of independent and identically distributed (i.i.d.) random
variables $\varepsilon_k$, $k \in \mathbb{Z}$, and $G$ and $H$ are
measurable functions such that $G(t; \mathcal{F}_i)$ and
$H(t; \mathcal{F}_i)$ are well defined for each $t \in [0, 1]$. This
setup is also used in [56].

For a random vector $Z$, we write $Z \in \mathcal{L}^q$, $q > 0$, if
$\|Z\|_q = \{E(|Z|^q)\}^{1/q} < \infty$, where $|\cdot|$ is the Euclidean
vector norm, and we denote $\|\cdot\| = \|\cdot\|_2$. A process
$J(t; \mathcal{F}_k)$ is said to be stochastically Lipschitz continuous
($J \in \mathrm{Lip}$ in short) if there exists $C > 0$ such that
$\|J(t_1; \mathcal{F}_k) - J(t_2; \mathcal{F}_k)\| \le C|t_1 - t_2|$
holds uniformly for all $t_1, t_2 \in [0, 1]$. Then, under condition (A2)
below, (2.3) defines locally stationary processes. Let
$\{\varepsilon_j'\}_{j \in \mathbb{Z}}$ be an i.i.d. copy of
$\{\varepsilon_j\}_{j \in \mathbb{Z}}$ and
$\mathcal{F}_{i,\{0\}} = (\ldots, \varepsilon_{-1}, \varepsilon_0', \varepsilon_1, \ldots, \varepsilon_i)$
be the coupled shift process. We define the functional dependence
measure

    $\delta_{k,q}(J) = \sup_{t \in [0,1]} \|J(t; \mathcal{F}_k) - J(t; \mathcal{F}_{k,\{0\}})\|_q$ and $\Theta_{m,q}(J) = \sum_{i=m}^{\infty} \delta_{i,q}(J).$

Let $\Lambda(J, t) = \sum_{k \in \mathbb{Z}} \mathrm{cov}\{J(t; \mathcal{F}_0), J(t; \mathcal{F}_k)\}$
be the long-run covariance matrix, and
$M(J, t) = E\{J(t; \mathcal{F}_0) J(t; \mathcal{F}_0)^\top\}$. Under the
short-range dependence condition $\Theta_{0,2}(J) < \infty$, both of them
are uniformly bounded over $t \in [0, 1]$. Let
$L(t; \mathcal{F}_k) = G(t; \mathcal{F}_k) H(t; \mathcal{F}_k)$, and we
shall make the following assumptions:

(A1) Smoothness: $\beta \in \mathcal{C}^3[0, 1]$;
(A2) Local stationarity: $G, L \in \mathrm{Lip}$;
(A3) Short-range dependence: $\Theta_{0,4}(G) + \Theta_{0,\iota}(L) < \infty$ for some $\iota > 2$;
(A4) The smallest eigenvalue of $M(G, \cdot)$ is bounded away from zero on $[0, 1]$.

A sufficient condition for (A3) is that
$\Theta_{0,2\iota}(G) + \Theta_{0,2\iota}(H) < \infty$ for some
$\iota > 2$.

3. Main results. 

3.1. Parameter estimation. Let $A$ be a pre-specified matrix and
$a = \int_0^1 A\beta(t)\,dt$. Then

    $\hat a = \int_0^1 A\hat\beta(t)\,dt$

is an estimate of $a$. For the partially time-varying coefficient model
(1.3), let $A_{D_1} \in \mathbb{R}^{|D_1| \times p}$ be a matrix with rows
$\{z_i^\top\}_{i \in D_1}$, where $z_i \in \mathbb{R}^p$ is the vector
with unit $i$th component and zeros elsewhere; then
$A_{D_1}\beta(t) = \beta_{D_1}(t)$, $t \in [0, 1]$. Although
$\beta_{D_1}$ can be consistently estimated by $\hat\beta_{D_1}(t)$ for
any $t \in (0, 1)$, the smoothed estimate

    $\hat\beta_{D_1} = \int_0^1 A_{D_1}\hat\beta(t)\,dt$

can have a better rate of convergence.

Theorem 3.1. Assume (A1)-(A4) and $\Theta_{n,\iota}(L) = O(n^{-\nu})$ for
some $\nu > 1/2 - 1/\iota$. Let
$\Sigma(t) = M(G, t)^{-1}\Lambda(L, t)M(G, t)^{-1}$ and
$\kappa_2 = \int_{-1}^{1} v^2 K(v)\,dv$. If $b_n \asymp n^{-c}$ for some
$1/6 < c < \min\{1/3, 1/2 - 1/(2\iota)\}$, then

    $n^{1/2}(\hat a - a - \xi_n) \Rightarrow N\{0, \int_0^1 A\Sigma(t)A^\top\,dt\}$, where $\xi_n = \frac{b_n^2 \kappa_2}{2} \int_0^1 A\beta''(t)\,dt.$

In Theorem 3.1, the term $\xi_n$ can be interpreted as the bias due to
nonparametric estimation, and it vanishes under the null hypothesis
(1.2). Hence the parametric component $\beta_{D_1}$ in the
semiparametric model (1.3) can have an $n^{1/2}$-consistent estimate
$\hat\beta_{D_1}$.
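The two-stage idea of this subsection, a local linear fit followed by integration over time, can be sketched as below. This is an illustration under stated assumptions: the names `local_linear_beta` and `integrated_estimate` are hypothetical, the Epanechnikov kernel is hard-coded, and the integral over $[0, 1]$ is approximated by an average over a midpoint grid.

```python
import numpy as np

def local_linear_beta(y, X, t, b):
    """Local linear estimate of beta(t) via weighted least squares."""
    n, p = X.shape
    d = np.arange(1, n + 1) / n - t
    w = 0.75 * np.maximum(0.0, 1.0 - (d / b) ** 2)   # Epanechnikov weights
    Z = np.hstack([X, X * d[:, None]])               # regressors for (eta_0, eta_1)
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(Z * sw[:, None], y * sw, rcond=None)
    return coef[:p]

def integrated_estimate(y, X, A, b, grid_size=200):
    """Approximate a_hat = int_0^1 A beta_hat(t) dt by a midpoint-rule average."""
    grid = (np.arange(grid_size) + 0.5) / grid_size
    betas = np.array([local_linear_beta(y, X, t, b) for t in grid])
    return A @ betas.mean(axis=0)
```

Choosing `A` as the rows of the identity matrix indexed by the parametric components gives the smoothed estimate of those components; the grid size is a numerical choice and not part of the theory.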

3.2. Hypothesis testing. For testing the null hypothesis (1.2), let
$W(\cdot)$ be a continuous mapping from $[0, 1]$ to symmetric
positive-definite matrices in $\mathbb{R}^{s \times s}$. Consider the
weighted integrated squared error

(3.1)    $T_n(A, a, W) = \int_0^1 \{A\hat\beta(t) - a\}^\top W(t) \{A\hat\beta(t) - a\}\,dt.$

If $a$ is unknown, an estimate can be used. For example, we can use
$\hat a = \int_0^1 A\hat\beta(t)\,dt$, which has the parametric
convergence rate; see Theorem 3.1. Chen and Hong [8] considered the
special case that $(x_i, e_i)$ is a stationary $\beta$-mixing process.
Their generalized Hausman test [26] relates to (3.1) with $A$ being the
identity matrix and $W(t) = M(G, t)$. Such a choice of weight matrices
should be used if we are interested in prediction. Alternatively, we can
use $W(t) = I_{s \times s}$, the identity matrix, to form the integrated
squared errors. Let $K_2 = \int_{-1}^{1} K(v)^2\,dv$; by Theorem 1 in
[56], $A\hat\beta(t)$ has the asymptotic covariance
$(n b_n)^{-1} K_2 A\Sigma(t)A^\top$. Hence, we can choose
$W(t) = \{A\Sigma(t)A^\top\}^{-1}$ to serve as a normalizer. In this
case, (3.1) is (proportionally) an integral of the squared local
t-statistics.

For a matrix $A$, define $\underline\rho(A) = \inf\{|Av| : |v| = 1\}$ and
$\bar\rho(A) = \sup\{|Av| : |v| = 1\}$. Let

    $K^*(x) = \int_{-1}^{1 - 2|x|} K(v) K(v + 2|x|)\,dv,$

and $K_2^* = \int_{-1}^{1} K^*(v)^2\,dv$. Since $K \in \mathcal{K}$, the
class of kernels assumed in Section 2, we have
$K^* \in \mathcal{C}^1[-1, 1]$ and $K^*$ is symmetric. Let

(3.2)    $\Sigma_{A,W}(t) = W(t)^{1/2} A\Sigma(t)A^\top W(t)^{1/2}, \qquad \Xi_{A,W,l} = \mathrm{tr}\{\int_0^1 \Sigma_{A,W}(t)^l\,dt\}.$

Theorem 3.2 provides asymptotic normality for $T_n(A, a, W)$.

Theorem 3.2. Assume (A1)-(A4), $\Theta_{0,4}(L) < \infty$ and
$\Theta_{n,\iota}(L) = O(n^{-\nu})$ for some $\nu > 1$. If
$b_n \asymp n^{-c}$ for some
$2/11 < c < \min\{1/3, 3/5 - 4/(5\iota), 2 - 4/\iota\}$, then

(3.3)    $n b_n^{1/2} \{T_n(A, a, W) - (n b_n)^{-1} K^*(0)\,\Xi_{A,W,1}\} \Rightarrow N(0, 4K_2^*\,\Xi_{A,W,2}).$

If in addition $\hat a = a + O_p(n^{-1/2})$, then (3.3) holds for
$T_n(A, \hat a, W)$.
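The kernel constants entering (3.3) can be approximated numerically. The sketch below assumes the Epanechnikov kernel and a plain midpoint rule; the helper names `k_star` and `k2_star` and the step counts are illustrative choices, not part of the paper.

```python
def epanechnikov(v):
    """Epanechnikov kernel K(v) = 3 max(0, 1 - v^2) / 4."""
    return 0.75 * max(0.0, 1.0 - v * v)

def k_star(x, kernel=epanechnikov, steps=1000):
    """Midpoint-rule value of K*(x) = int_{-1}^{1-2|x|} K(v) K(v + 2|x|) dv."""
    shift = 2.0 * abs(x)
    upper = 1.0 - shift
    if upper <= -1.0:          # empty integration range, so K*(x) = 0
        return 0.0
    h = (upper + 1.0) / steps
    total = 0.0
    for j in range(steps):
        v = -1.0 + (j + 0.5) * h
        total += kernel(v) * kernel(v + shift)
    return h * total

def k2_star(kernel=epanechnikov, steps=400):
    """Midpoint-rule value of K_2^* = int_{-1}^{1} K*(v)^2 dv."""
    h = 2.0 / steps
    return h * sum(k_star(-1.0 + (j + 0.5) * h, kernel) ** 2 for j in range(steps))
```

Note that $K^*(0) = \int_{-1}^{1} K(v)^2\,dv$, which equals $3/5$ for the Epanechnikov kernel, so `k_star(0.0)` provides a simple check of the quadrature.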

Let $\Phi(\cdot)$ be the cumulative standard normal distribution function
and $q_{1-\alpha}$ be the corresponding $(1 - \alpha)$th quantile. We
reject the null hypothesis (1.2) at level $\alpha$ if

(3.4)    $T_n(A, \hat a, W) > (n b_n)^{-1} K^*(0)\,\Xi_{A,W,1} + n^{-1} b_n^{-1/2} (4K_2^*\,\Xi_{A,W,2})^{1/2} q_{1-\alpha}.$

Let $f : [0, 1] \to \mathbb{R}^s$ be of class $\mathcal{C}^3$, and
$\{d_n\}$ be a sequence of nonnegative real numbers. Proposition 3.1
provides the asymptotic power of the test (3.4) under the local
alternative

(3.5)    $A\beta(t) = a + d_n f(t).$

Proposition 3.1. Assume conditions of Theorem 3.2. If
$n b_n^{1/2} d_n^2 \to \varsigma > 0$, then the power of the test (3.4)
satisfies

(3.6)    $\mathrm{Power} \to \Phi\left\{ q_\alpha + \frac{\varsigma \int_0^1 f(t)^\top W(t) f(t)\,dt}{(4K_2^*\,\Xi_{A,W,2})^{1/2}} \right\}.$

3.3. Variable selection. In this section we shall propose an information
criterion for time-varying coefficient models that can consistently
identify the true set of relevant predictors. Recall that
$D^* = \{1, \ldots, p\}$ is the whole set of potential predictors, and
$\hat\beta(\cdot)$ is the coefficient estimate. Let $D_0$ be the true set
of relevant predictors. For a candidate subset $D \subset D^*$, we can
compute the variable selection information criterion

(3.7)    $\mathrm{VIC}(D) = \log\{\mathrm{RSS}(D)\} + \chi_n |D|$, where $\mathrm{RSS}(D) = \sum_{i=1}^n \{y_i - x_{D,i}^\top \hat\beta_D(i/n)\}^2.$

Here $\chi_n$ is a tuning parameter. We select a subset $D$ that
minimizes $\mathrm{VIC}(D)$, thus balancing goodness-of-fit and model
complexity. A smaller $\chi_n$ leads to more predictors, and vice versa.
Theorem 3.3 provides theoretical properties of our procedure.

Theorem 3.3. Assume (A1)-(A4),
$\Theta_{0,4}(L) + \Theta_{0,2}(H) < \infty$ and
$\Theta_{n,\iota}(L) = O(n^{-\nu})$ for some $\nu > 1/2$. Let
$\varphi_n = (n b_n)^{-1}\{n^{1/\iota} + (n b_n \log n)^{1/2}\} + b_n^2$
and $\rho_n = n^{-1/2} b_n^{-1} + b_n$. If $b_n \asymp n^{-c}$ for some
$0 < c < 1 - 1/\iota$,

(3.8)    $\chi_n \to 0$ and $\{\varphi_n(\varphi_n + \rho_n)\}^{-1} \chi_n \to \infty,$

then, for any $D \ne D_0$,
$\mathrm{pr}\{\mathrm{VIC}(D) > \mathrm{VIC}(D_0)\} \to 1$.
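The selection rule based on the information criterion (3.7) can be sketched as an exhaustive search over candidate subsets. This is a minimal illustration, not the authors' code: the function names are hypothetical, the empty subset is excluded for simplicity, the Epanechnikov kernel is hard-coded, and the search is only practical for a small number of predictors.

```python
import itertools
import numpy as np

def fitted_values(y, X, b):
    """In-sample fits x_i^T beta_hat(i/n) from a local linear smoother."""
    n, p = X.shape
    times = np.arange(1, n + 1) / n
    fits = np.empty(n)
    for i, t in enumerate(times):
        d = times - t
        sw = np.sqrt(0.75 * np.maximum(0.0, 1.0 - (d / b) ** 2))
        Z = np.hstack([X, X * d[:, None]]) * sw[:, None]
        coef, *_ = np.linalg.lstsq(Z, y * sw, rcond=None)
        fits[i] = X[i] @ coef[:p]
    return fits

def vic(y, X, subset, b, chi):
    """VIC(D) = log RSS(D) + chi * |D| for the columns listed in `subset`."""
    resid = y - fitted_values(y, X[:, list(subset)], b)
    return float(np.log(np.sum(resid ** 2)) + chi * len(subset))

def select_predictors(y, X, b, chi):
    """Return the nonempty subset of column indices minimizing VIC."""
    p = X.shape[1]
    candidates = [s for r in range(1, p + 1)
                  for s in itertools.combinations(range(p), r)]
    return min(candidates, key=lambda s: vic(y, X, s, b, chi))
```

With a penalty such as `chi = n ** (-2 / 5)`, adding an irrelevant predictor typically lowers $\log \mathrm{RSS}$ by much less than the penalty, whereas dropping a relevant one raises it sharply.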
4. Implementation.

4.1. Covariance matrix estimation. Theorems 3.1 and 3.2 both involve
unknown quantities depending on the covariance matrices $M(G, t)$ and
$\Lambda(L, t)$, $t \in [0, 1]$. The problem of estimating covariance
matrices has been extensively studied; see [3, 36, 39] among others. Let
$\varpi_n$, $\tau_n$ and $\varrho_n$ be bandwidth sequences satisfying
$\varpi_n \to 0$, $\tau_n \to 0$, $\varrho_n \to 0$ and
$n\tau_n\varrho_n \to \infty$. Let
$\mathcal{I}_{n,1} = [0, \tau_n\varrho_n]$,
$\mathcal{I}_{n,2} = (\tau_n\varrho_n, 1 - \tau_n\varrho_n)$,
$\mathcal{I}_{n,3} = [1 - \tau_n\varrho_n, 1]$ and



L i LT + 2L i ^LTl 



Ai(L,r n ^ n ) 



3=1 



LjL,7 + 2Lj^Ljl{ 0< j/ n _ J y n < Tn£ , n }, 
j'=i 

For $t \in [0, 1]$, we estimate $M(G, t)$ and $\Lambda(L, t)$,
respectively, by

    $\hat M(G, t) = \sum_{i=1}^n x_i x_i^\top \omega_{i,\varpi_n}(t)$



if i/n GX„ i2 ; 
if i/n GX n , 3 . 



and

    $\hat\Lambda(L, t) = \sum_{i=1}^n \frac{\Lambda_i(L, \tau_n\varrho_n) + \Lambda_i(L, \tau_n\varrho_n)^\top}{2}\,\omega_{i,\tau_n}(t),$

where
$\omega_{i,b}(t) = K\{(i/n - t)/b\}\{P_{b,2}(t) - (t - i/n)P_{b,1}(t)\} / \{P_{b,2}(t)P_{b,0}(t) - P_{b,1}(t)^2\}$
are local linear weights with bandwidth $b$ and
$P_{b,l}(t) = \sum_{j=1}^n (t - j/n)^l K\{(j/n - t)/b\}$. Proposition 4.1
provides consistency of our covariance matrix estimates.



Proposition 4.1. Assume (A2),
$\Theta_{0,4}(G) + \Theta_{0,4}(L) < \infty$ and
$\Theta_{n,2}(L) = O(n^{-\nu})$ for some $\nu > 0$. If both
$M(G, \cdot)$ and $\Lambda(L, \cdot)$ are in class $\mathcal{C}^2$, then

(4.1)    $\sup_{t \in [0,1]} \|\hat M(G, t) - M(G, t)\| = O\{(n\varpi_n)^{-1/2} + \varpi_n^2\}$

and



(4.2) sup \\A(L,t)-A(L^=0{ g y 2 + (nr n Q n )- v + (T n Q n y/^+r 2 }. 
te[o,i] 

The bound in (4.1) is optimized and becomes $O(n^{-2/5})$ if
$\varpi_n \asymp n^{-1/5}$. The optimal bound in (4.2) is complicated and
it depends on $\nu$, the decay rate of dependence. In particular, if
$\nu > 2/3$, then the optimal bound in (4.2) is
$O\{n^{-2\nu/(5\nu+2)}\}$ if $\tau_n \asymp n^{-\nu/(5\nu+2)}$ and
$\varrho_n \asymp n^{-4\nu/(5\nu+2)}$; otherwise it is
$O\{n^{-\nu/(\nu+2)}\}$ if $\tau_n \asymp n^{-(1-\nu)/(\nu+2)}$ and
$\varrho_n \asymp n^{-2\nu/(\nu+2)}$, or
$\tau_n \asymp n^{-\nu/(2\nu+4)}$ and $\varrho_n \asymp n^{-1/2}$. In
computing $\hat\Lambda(L, t)$, since $(e_i)$ is usually unknown, we shall
replace it by $(\hat e_i)$, the estimated local linear residuals.

4.2. A simulation-assisted testing procedure. By the sandwich formula,
let $\hat\Sigma(t) = \hat M(G, t)^{-1}\hat\Lambda(L, t)\hat M(G, t)^{-1}$
and, as in (3.2), correspondingly define $\hat\Sigma_{A,W}(t)$ and
$\hat\Xi_{A,W,l}$. By (3.4), we reject the null hypothesis (1.2) at level
$\alpha$ if

(4.3)    $\Delta_n(A, \hat a, W) = \frac{n b_n^{1/2}\{T_n(A, \hat a, W) - (n b_n)^{-1} K^*(0)\,\hat\Xi_{A,W,1}\}}{(4K_2^*\,\hat\Xi_{A,W,2})^{1/2}} > q_{1-\alpha}.$

If $a$ is known, then in (4.3) we can use $a$ instead of $\hat a$. The
criterion (4.3) usually does not perform well because of the slow
convergence in (3.3). Note that the statistic $\Delta_n(A, \hat a, W)$ is
asymptotically pivotal, so we propose a simulation-assisted testing
procedure that can substantially improve the finite-sample performance.
In particular, we generate i.i.d. standard normal random variables
$y_i^\circ$, $i = 1, \ldots, n$, and i.i.d. standard multivariate normal
random vectors $x_i^\circ$, $i = 1, \ldots, n$, that are also independent
of $(y_i^\circ)$. We compute the corresponding
$\Delta_n^\circ(A, \hat a, W)$, and repeat this many times to obtain its
empirical quantile $\hat q_{1-\alpha}$. We reject the null hypothesis
(1.2) at level $\alpha$ if $\Delta_n(A, \hat a, W) > \hat q_{1-\alpha}$.
Our procedure has a similar flavor to the Wilks type of phenomenon
discussed in [18]. A major difference is that we allow dependent and
nonstationary errors.
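The Monte Carlo step of the simulation-assisted procedure can be sketched generically. In this illustration the test statistic is passed in as a callable `statistic(y, X)`, which is an assumption made to keep the example self-contained; in practice it would compute the standardized statistic in (4.3) from the simulated sample. The function name, the default number of repetitions and the seed are likewise hypothetical.

```python
import numpy as np

def simulated_quantile(statistic, n, p, level=0.95, reps=2000, seed=0):
    """Empirical quantile of a statistic under simulated Gaussian data.

    Each repetition draws i.i.d. standard normal responses and, independently,
    i.i.d. standard multivariate normal predictors, then evaluates `statistic`.
    """
    rng = np.random.default_rng(seed)
    draws = np.empty(reps)
    for r in range(reps):
        y_sim = rng.standard_normal(n)          # simulated responses
        X_sim = rng.standard_normal((n, p))     # simulated predictors
        draws[r] = statistic(y_sim, X_sim)
    return float(np.quantile(draws, level))
```

The observed statistic is then compared with the returned cut-off value in place of the normal quantile.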

4.3. Bandwidth selection. Bandwidth selection for nonparametric
hypothesis testing is a nontrivial problem, and it has been studied in
[15, 21, 22, 30] among many others. As commented by Wang [48], there
exists no uniform guidance for an optimal choice. On the positive side,
our simulation results in Section 4.6 indicate that the empirical
acceptance probabilities are not very sensitive to the choice of
bandwidth. Hence one can simply choose $b_n = n^{-1/5}$, which has the
asymptotic mean integrated squared error (AMISE) optimal rate. As an
alternative, we consider the generalized cross-validation (GCV) selector
by [10], and estimate the covariance matrix
$\Gamma_n = \{E(e_i e_j)\}_{1 \le i,j \le n}$ to correct for dependence
[49]. Specifically, let $Y = (y_1, \ldots, y_n)^\top$; then for any
bandwidth $b \in (0, 1)$, one can write the local linear smoothed fitted
values as $\hat Y(b) = H(b)Y$, where $H(b)$ is the corresponding hat
matrix. We choose the bandwidth $\hat b_n$ that minimizes

(4.4)    $\mathrm{GCV}(b) = \frac{n^{-1}\{\hat Y(b) - Y\}^\top \hat\Gamma_n^{-1} \{\hat Y(b) - Y\}}{[1 - \mathrm{tr}\{H(b)\}/n]^2}.$

An estimate of the covariance matrix $\Gamma_n$ can be obtained by using
the banding technique as in [4, 50]. The GCV selector (4.4) works
reasonably well in our simulation studies.
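A GCV criterion of the form (4.4) can be sketched as follows. This is an illustration under assumptions: the helper names `hat_matrix` and `gcv` are hypothetical, the Epanechnikov kernel is hard-coded, and the inverse covariance matrix defaults to the identity (that is, no dependence correction) unless a banded or other estimate is supplied through `gamma_inv`.

```python
import numpy as np

def hat_matrix(X, b):
    """Hat matrix H(b) of the local linear fit, so that Y_hat(b) = H(b) Y."""
    n, p = X.shape
    times = np.arange(1, n + 1) / n
    H = np.empty((n, n))
    for i, t in enumerate(times):
        d = times - t
        w = 0.75 * np.maximum(0.0, 1.0 - (d / b) ** 2)
        Z = np.hstack([X, X * d[:, None]])
        ZtW = (Z * w[:, None]).T                 # Z^T W, shape (2p, n)
        G = np.linalg.pinv(ZtW @ Z) @ ZtW        # maps Y to (eta_0, eta_1)
        H[i] = X[i] @ G[:p]                      # row i gives x_i^T beta_hat(i/n)
    return H

def gcv(y, X, b, gamma_inv=None):
    """GCV score for bandwidth b; gamma_inv defaults to the identity."""
    n = len(y)
    H = hat_matrix(X, b)
    r = H @ y - y
    if gamma_inv is None:
        gamma_inv = np.eye(n)
    return float((r @ gamma_inv @ r / n) / (1.0 - np.trace(H) / n) ** 2)
```

A bandwidth can then be chosen by evaluating `gcv` on a grid of candidate values and taking the minimizer.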

We shall now provide data-driven choices of $\varpi_n$, $\tau_n$ and
$\varrho_n$ in the estimation of covariance matrices. From the
construction in Section 4.1, we truncate the long-run covariance matrix
estimate at lag $m_n = n\tau_n\varrho_n$ and, by the proof of Theorem
3.1,

    $n^{-1/2} \sum_{i=1}^{n-k} \left[ L_i^\top L_{i+k} - \mathrm{tr} \int_0^1 \mathrm{cov}\{L(t; \mathcal{F}_0), L(t; \mathcal{F}_k)\}\,dt \right] \Rightarrow N(0, \sigma_k^2),$

where $\sigma_k^2$ is the integrated long-run variance of the process
$\{L_i^\top L_{i+k}\}_{i=1}^{n-k}$. We propose to choose
$\hat m_n = \max\{k \ge 0 : |n^{-1/2} \sum_{i=1}^{n-k} L_i^\top L_{i+k}| > 1.96\,\hat\sigma_k\}$.
Note that the final estimate is a local linear smoother of
$\{\Lambda_i(L, \hat m_n/n) + \Lambda_i(L, \hat m_n/n)^\top\}/2$,
$i = 1, \ldots, n$, and we can apply the GCV method to select
$\hat\tau_n$. The latter can also be applied to $x_i x_i^\top$,
$i = 1, \ldots, n$, to select $\tilde\varpi_n$, and we take
$\hat\varpi_n = \max(\tilde\varpi_n, n^{-1/5})$ to avoid numerical
singularities. These data-driven choices of bandwidths are able to
capture dependence and nonstationarity and perform well in our
simulation studies.

For the information criterion (3.7), if $\iota > 5/2$ and
$b_n \asymp n^{-1/5}$, then (3.8) becomes

(4.5)    $\chi_n \to 0$ and $\{n^{-3/5}(\log n)^{1/2}\}^{-1} \chi_n \to \infty.$

Note that condition (4.5) is more restrictive than the traditional
Bayesian information criterion (BIC) because of parameter instability.
Under the latter setting, a heavier penalty on model complexity is
usually needed to suppress the over-fitting problem; see [51] for a
similar finding on cross-validation methods. As a rule of thumb, we
suggest using $\chi_n = n^{-2/5}$. This simple choice performs reasonably
well, as can be seen from our simulation study. The choice of bandwidth
becomes further complicated due to model uncertainty. We suggest a
two-stage selection procedure: let $\tilde b_n$ be the bandwidth selected
by GCV with all available predictors, and use the information criterion
(3.7) with this bandwidth to select a pilot set of relevant predictors;
then select the bandwidth $\hat b_n$ by applying the GCV method to this
pilot set.
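The reduction of the second condition in (3.8) to the one in (4.5) is a short rate calculation. The following sketch assumes $\iota > 5/2$ and $b_n \asymp n^{-1/5}$, with $\varphi_n = (n b_n)^{-1}\{n^{1/\iota} + (n b_n \log n)^{1/2}\} + b_n^2$ and $\rho_n = n^{-1/2} b_n^{-1} + b_n$ as in Theorem 3.3.

```latex
\begin{align*}
n b_n &\asymp n^{4/5}, \\
(n b_n)^{-1} n^{1/\iota} &\asymp n^{1/\iota - 4/5} = o(n^{-2/5})
  \quad \text{since } 1/\iota < 2/5, \\
(n b_n)^{-1} (n b_n \log n)^{1/2} &= (n b_n)^{-1/2} (\log n)^{1/2}
  \asymp n^{-2/5} (\log n)^{1/2}, \\
b_n^2 &\asymp n^{-2/5}, \\
\varphi_n &\asymp n^{-2/5} (\log n)^{1/2}, \\
\rho_n &\asymp n^{-3/10} + n^{-1/5} \asymp n^{-1/5}, \\
\varphi_n (\varphi_n + \rho_n) &\asymp n^{-3/5} (\log n)^{1/2}.
\end{align*}
```

The rule of thumb $\chi_n = n^{-2/5}$ tends to zero and dominates $n^{-3/5}(\log n)^{1/2}$, so it satisfies both requirements. As a numerical reference, $n = 730$ gives $n^{-2/5} \approx 0.072$.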

4.4. Locally stationary autoregressive processes. Modeling a
nonstationary process by autoregressive models with time-varying
coefficients has attracted considerable attention. A traditional
approach is to project the coefficient function onto a basis of temporal
functions and estimate the basis coefficients; see, for example, [25,
43, 47]. Other contributions on parameter estimation can be found in
[12, 13, 24, 37] among others. Abramovich et al. [1] considered the
problem of order selection by requiring multiple realizations. Let
$a_1(\cdot), \ldots, a_p(\cdot)$ be continuous functions. We shall prove
that the time-varying autoregressive process

(4.6)    $y_i = a_1(i/n) y_{i-1} + \cdots + a_p(i/n) y_{i-p} + e_i, \quad i = 1, \ldots, n,$

has an approximate solution of form (2.3), and the difference is of a
negligible order. Hence the results in Section 3 can be directly applied
to address the problems of parameter estimation, hypothesis testing and
order selection for time-varying autoregressive models.

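Recall that $e_i = H(i/n; \mathcal{F}_i)$. Let
$x_i = (y_i, \ldots, y_{i-p+1})^\top$,

    $A(t) = \begin{pmatrix} a_1(t) & \cdots & a_{p-1}(t) & a_p(t) \\ 1 & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & 1 & 0 \end{pmatrix} \in \mathbb{R}^{p \times p}$

and

    $H^\circ(t; \mathcal{F}_k) = (H(t; \mathcal{F}_k), 0, \ldots, 0)^\top.$

Then (4.6) can be written as

(4.7)    $x_i = A(i/n) x_{i-1} + H^\circ(i/n; \mathcal{F}_i).$

We shall make the following assumptions:

(T1) The starting point $(y_p, \ldots, y_1)^\top \in \mathcal{L}^q$.
(T2) The coefficient functions $a_j(\cdot)$, $j = 1, \ldots, p$, are Lipschitz continuous on $[0, 1]$.
(T3) $\sum_{j=1}^p a_j(t) z^j \ne 1$ for all $|z| \le 1 + c$ with $c > 0$ uniformly in $t \in [0, 1]$.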

Conditions (T2) and (T3) entail local stationarity and short-range
dependence, respectively; see also [11]. Proposition 4.2 states that the
autoregressive process (4.7) can be approximated by (2.3) with a uniform
approximation error of order $O_p(n^{-1})$.
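Condition (T3) can be checked numerically through the companion matrix $A(t)$ of (4.7): the polynomial $1 - \sum_j a_j(t) z^j$ has no root with $|z| \le 1 + c$ exactly when every eigenvalue of $A(t)$ has modulus below $1/(1 + c)$. The sketch below evaluates this on a finite grid of time points, which is only an approximation to the uniform condition; the function names, the default $c$ and the grid size are illustrative assumptions.

```python
import numpy as np

def companion(a):
    """Companion matrix A(t) for AR coefficients a = (a_1(t), ..., a_p(t))."""
    a = np.asarray(a, dtype=float)
    p = a.size
    A = np.zeros((p, p))
    A[0, :] = a                      # first row holds the AR coefficients
    A[1:, :-1] = np.eye(p - 1)       # sub-diagonal shifts the lagged values
    return A

def spectral_radius(a):
    """Largest eigenvalue modulus of the companion matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(companion(a)))))

def satisfies_t3(coef_funcs, c=0.01, grid_size=201):
    """Grid check of (T3): spectral radius below 1/(1 + c) at each grid point."""
    grid = np.linspace(0.0, 1.0, grid_size)
    return all(
        spectral_radius([f(t) for f in coef_funcs]) < 1.0 / (1.0 + c)
        for t in grid
    )
```

For instance, a first-order model with coefficient function $0.4\sin(2\pi t)$ passes the check, while a constant coefficient of $1.1$ does not.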

Proposition 4.2. Assume (T1)-(T3). If $H \in \mathrm{Lip}$, then there
exists a measurable function $G \in \mathrm{Lip}$ and a constant $C > 0$
such that

(4.8)    $\max_{1 \le i \le n} \|x_i - G(i/n; \mathcal{F}_i)\| \le C n^{-1}.$

In proving asymptotic results of Section 3, the key quantities are the
partial sum processes $\sum_{i=1}^k x_i x_i^\top$ and
$\sum_{i=1}^k x_i e_i$, $k = 1, \ldots, n$. By Proposition 4.2, there
exists a measurable function $G \in \mathrm{Lip}$ such that

    $\max_{1 \le k \le n} \left| \sum_{i=1}^k \{x_i x_i^\top - G(i/n; \mathcal{F}_i) G(i/n; \mathcal{F}_i)^\top\} \right| = O_p(1)$

and

    $\max_{1 \le k \le n} \left| \sum_{i=1}^k \{x_i - G(i/n; \mathcal{F}_i)\} e_i \right| = O_p(1).$

A careful check of the proofs of our results in Section 3 indicates that,
due to the above relation, they are still valid for the time-varying
autoregressive process (4.6).

4.5. A comparison with GLRT. The generalized likelihood ratio test
(GLRT, [18]) is a popular method for nonparametric hypothesis testing.
It was used by Cai, Fan and Li [7] for testing coefficient constancy,
and generalized by Fan and Huang [16] to semiparametric models.
Properties of the GLRT have been extensively studied for i.i.d. samples.
However, its validity for dependent data is not guaranteed. We shall
here briefly review the GLRT and compare it with our method. For the
null hypothesis

    $H_0 : \beta(\cdot) = \beta$

for some vector $\beta \in \mathbb{R}^p$, the GLRT statistic is defined as

    $T_{\mathrm{GLR}} = \frac{n}{2} \log \frac{\mathrm{RSS}_0}{\mathrm{RSS}_1} = \frac{n}{2} \log \frac{\sum_{i=1}^n \{y_i - x_i^\top \hat\beta\}^2}{\sum_{i=1}^n \{y_i - x_i^\top \hat\beta(i/n)\}^2},$

where $\hat\beta$ is the least squares estimate. To construct the null
distribution of $T_{\mathrm{GLR}}$, we use the conditional bootstrap as
suggested by [7, 16]. Let $\hat\sigma^2 = n^{-1}\mathrm{RSS}_1$ and
$\{e_i^\circ\}_{i \in \mathbb{Z}}$ be i.i.d. $N(0, \hat\sigma^2)$. We
generate the bootstrap sample
$y_i^\circ = x_i^\top \hat\beta + e_i^\circ$, $i = 1, \ldots, n$, and
compute the test statistic $T_{\mathrm{GLR}}^\circ$. We approximate the
distribution of $T_{\mathrm{GLR}}$ by that of $T_{\mathrm{GLR}}^\circ$.

Consider the AR-ARCH process with time-varying conditional variance

    $y_i = 0.5\,y_{i-1} + 0.25[1 + \{1 + \exp(3 - 6i/n)\}^{-1}]\,e_i,$
    $e_i = (1 + 0.25\,e_{i-1}^2)^{1/2}\,\varepsilon_i,$

where $\{\varepsilon_i\}_{i \in \mathbb{Z}}$ are i.i.d. $N(0, 1)$. Let
$n = 500$ and the bandwidth $b_n = n^{-1/5} = 0.289$. We consider testing
whether the coefficient of $y_{i-1}$ is a constant. For
$\Delta_n(A, \hat a, W)$, we use the identity weights
$W = I_{p \times p}$ and obtain its cut-off value by the
simulation-assisted procedure in Section 4.2 with 5000 simulated
$\Delta_n^\circ(A, \hat a, W)$. For $T_{\mathrm{GLR}}$, the cut-off value
is obtained by 5000 bootstrapped $T_{\mathrm{GLR}}^\circ$. We generate
5000 realizations of the AR-ARCH process and use Q-Q plots to examine
the performance. The results are presented in Figure 1. It shows that
the GLRT fails to provide valid p-values in the presence of dependence
and nonstationarity. For example, the empirical acceptance probabilities
are 79%, 86.4% and 94.7% for the 90%, 95% and 99% nominal levels,
respectively. As shown in Figure 1(b), our dependence-adjusted procedure
provides a satisfactory approximation of $\Delta_n(A, \hat a, W)$. At the
90%, 95% and 99% nominal levels, our empirical acceptance probabilities
are 89.5%, 95.0% and 98.7%, respectively.

T. ZHANG AND W. B. WU

[Fig. 1 panels: (a) empirical levels of $T_{\mathrm{GLR}}$ against
quantiles of $T_{\mathrm{GLR}}^\circ$; (b) empirical levels of
$\Delta_n$ against quantiles of $\Delta_n^\circ$.]

Fig. 1. A comparison of the GLRT (a) with our dependence-adjusted testing
procedure (b). Q-Q plots of $T_{\mathrm{GLR}}$ against
$T_{\mathrm{GLR}}^\circ$ (a) and $\Delta_n(A, \hat a, W)$ against
$\Delta_n^\circ(A, \hat a, W)$ (b). The dashed lines in (a) and (b) have
unit slope and zero intercept.

4.6. Simulation studies. We shall here carry out a simulation study to
examine the finite-sample performance of our hypothesis testing
procedure in Section 3.2 and the information criterion for variable
selection in Section 3.3. Let $P_j(t)$ be the $j$th order Legendre
polynomial and $P(t) \in \mathbb{R}^{5 \times 5}$ be the diagonal matrix
with $j$th diagonal component $P_j(2t - 1)/4$. Let
$\varepsilon_k = (\varepsilon_{k,1}, \ldots, \varepsilon_{k,6})^\top$,
$k \in \mathbb{Z}$, be i.i.d. Rademacher random variables and
$M^\circ = (0.2^{|i-j|})_{1 \le i,j \le 5}$. Then
$\xi_k = M^\circ(\varepsilon_{k,1}, \ldots, \varepsilon_{k,5})^\top$,
$k \in \mathbb{Z}$, forms a sequence of independent random vectors with
correlated components. Let
$x_i = \sum_{j=0}^{\infty} P(i/n)^j \xi_{i-j}$ and
$e_i = \sum_{j=0}^{\infty} P_6(i/n)^j \varepsilon_{i-j,6}$. Consider:

(i) a linear model with heteroscedastic errors: for $i = 1, \ldots, n$,

    $y_i = (2i/n - 1)^2 + 2x_{i,1} + 2\log(i/n + 1)\,x_{i,2} + 0.5(x_{i,2}^2 + x_{i,3}^2)^{1/2} e_i;$

(ii) a linear model with autoregressive effects: for $i = 1, \ldots, n$,

    $y_i = 0.4\sin(2\pi i/n)\,y_{i-1} + 0.3x_{i,1} + 0.4(2i/n - 1)^3 x_{i,2} + \exp(0.5i/n - 2)\,\varepsilon_{i,6}.$

Let $n = 500$ and the bandwidth $b = 0.1k$, $k = 1, \ldots, 9$. We use the
information criterion (3.7) to estimate the set of relevant predictors.
For model (ii), the whole set of potential predictors is taken to be
$(x_i)$ along with three lags of the response variable. A realization is
categorized as under-fitting if we miss at least one relevant predictor,
and over-fitting if the selected set contains at least one irrelevant
predictor without under-fitting. The results are summarized in Table 1
based on 5000 realizations. Given models (i) and (ii), we use the
hypothesis testing procedure to test whether $(x_{i,1})$ has
time-invariant contributions. Three types of weight matrices are used:
the identity weights $W_1(t) = I_{s \times s}$, the normalizer weights
$W_2(t) = \{A\hat\Sigma(t)A^\top\}^{-1}$ and the prediction weights
$W_3(t) = A\hat M(G, t)A^\top$. For each configuration, we use the
simulation-assisted hypothesis testing procedure in Section 4.2 to
obtain cut-off values $\hat q_{0.90}$ and $\hat q_{0.95}$ with 5000
simulated $\Delta_n^\circ(A, \hat a, W)$. We then generate 5000
realizations of both models (i) and (ii), and calculate the
corresponding test statistic $\Delta_n(A, \hat a, W)$. Empirical
acceptance probabilities are reported in Table 2.

Table 1
Percentages of under-fitted, correctly fitted and over-fitted models
selected by the variable selection information criterion (3.7) for
n = 500. Medians of the selected bandwidths are $b_n(i) = 0.25$ and
$b_n(ii) = 0.18$ for models (i) and (ii), respectively, and c = 2/3

                               VIC
b              Under-fitted   Correctly fitted   Over-fitted

Model (i)
0.1                0.0             100.0             0.0
$cb_n(i)$          0.0             100.0             0.0
0.2                0.0             100.0             0.0
$b_n(i)$           0.0             100.0             0.0
0.3                0.0             100.0             0.0
$b_n(i)/c$         0.0             100.0             0.0
0.4                0.0             100.0             0.0
0.5                0.0             100.0             0.0
0.6                0.0             100.0             0.0
0.7                0.0             100.0             0.0
0.8                0.0             100.0             0.0
0.9                0.0             100.0             0.0

Model (ii)
0.1                0.0              99.8             0.2
$cb_n(ii)$         0.0             100.0             0.0
$b_n(ii)$          0.0             100.0             0.0
0.2                0.0             100.0             0.0
$b_n(ii)/c$        0.0             100.0             0.0
0.3                0.0             100.0             0.0
0.4                0.0             100.0             0.0
0.5                0.0             100.0             0.0
0.6                0.0             100.0             0.0
0.7                0.2              99.8             0.0
0.8                1.1              98.9             0.0
0.9                3.3              96.7             0.0

Table 2
Empirical acceptance probabilities (in percentage) of the
simulation-assisted hypothesis testing procedure in Section 4.2 for
n = 500. Medians of the selected bandwidths are $b_n(i) = 0.25$ and
$b_n(ii) = 0.18$ for models (i) and (ii), respectively, and c = 2/3

                  $W_1(t)$        $W_2(t)$        $W_3(t)$
b               90%    95%      90%    95%      90%    95%

Model (i)
0.1             92.7   96.9     93.2   97.1     92.4   96.5
$cb_n(i)$       92.2   96.5     92.9   96.7     92.0   96.4
0.2             91.7   96.0     92.4   96.2     91.7   95.9
$b_n(i)$        90.9   95.8     91.4   96.4     90.3   95.5
0.3             90.3   95.2     90.9   95.7     90.0   94.8
$b_n(i)/c$      90.5   95.1     90.9   95.8     90.3   95.1
0.4             90.9   95.5     91.7   96.1     91.1   95.5
0.5             90.4   94.8     91.3   95.2     90.4   95.0
0.6             90.7   95.3     91.2   95.5     90.7   95.2
0.7             91.6   95.6     91.7   95.7     91.5   95.7
0.8             89.1   94.9     89.0   94.8     89.0   94.8
0.9             89.9   95.0     90.4   95.4     89.8   94.9

Model (ii)
0.1             93.1   97.0     93.2   97.0     92.9   96.9
$cb_n(ii)$      92.6   96.6     92.2   96.1     92.1   96.2
$b_n(ii)$       91.1   96.1     91.0   95.5     90.7   95.8
0.2             91.3   96.3     91.2   96.2     91.0   96.0
$b_n(ii)/c$     90.9   95.4     90.8   95.7     90.7   95.2
0.3             90.8   95.1     89.9   94.8     90.4   95.1
0.4             90.0   95.1     89.8   94.9     89.9   95.1
0.5             89.5   94.6     88.2   93.9     89.2   94.5
0.6             88.6   93.6     87.7   93.1     88.5   93.6
0.7             88.6   93.8     87.4   93.6     88.1   93.8
0.8             87.4   92.6     87.0   92.2     87.4   92.7
0.9             88.3   94.4     88.1   94.0     88.2   94.3

It can be seen that the empirical acceptance probabilities are fairly
close to their nominal levels (90% and 95%), and the information
criterion (3.7) performs quite well. In addition, the results are not
sensitive to choices of weight matrices and bandwidths. For models (i)
and (ii), medians of the selected bandwidths based on the GCV criterion
(4.4) are $b_n(i) = 0.25$ and $b_n(ii) = 0.18$, respectively. Observe
that for model (i), the performance is also quite satisfactory if we
choose bandwidths $0.25c$, $0.25$ and $0.25/c$ with $c = 2/3$. A similar
claim can be made for model (ii) as well.
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4.7. A real-data example. We apply our model selection method to the 
Hong Kong circulatory and respiratory data which contains daily measure- 
ments of pollutants and hospital admissions in Hong Kong between Jan- 
uary 1, 1994 and December 31, 1995 (n = 730). Four pollutants, sulphur 
dioxide (in ug;/m 3 ), nitrogen dioxide (in ug/m 3 ), dust (in ug/m 3 ) and ozone 
(in ug/m 3 ), are considered here. The purpose is to understand the associ- 
ation between daily hospital admission (yj) and levels of sulphur dioxide 
(xi,2)i nitrogen dioxide (2^,3), dust (2^,4) and ozone (2^5). Figure 2 provides 
their time series plots. In the analysis, we regularize the data so that each 




300 400 

Time (days) 




£j 100 




O 40 



300 400 500 

Time (days) 




100 200 300 400 500 600 700 

Time (days) 



100 200 300 400 500 600 700 

Time (days) 



Fig. 2. Time series plots for daily hospital admission (top) and levels of sulphur dioxide 
(middle left), nitrogen dioxide (middle right), dust (bottom left) and ozone (bottom right) 
from January 1, 1994 to December 31, 1995. 
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Table 3 

Summary of test statistics and corresponding p-values for testing parameter constancy 
with 5000 simulated A°(A,a,W) 







Wi(t) 




Wa(t) 




Wj(t) 


A n 


p- value 


A„ 


p- value 


A n 


p- value 


MO 


69.77 


0.00 


120.77 


0.02 


69.77 


0.00 


&(•) 


6.88 


0.14 


12.47 


0.19 


6.85 


0.15 


&(•) 


16.27 


0.02 


30.13 


0.09 


23.06 


0.01 



variable has zero mean and unit variance. Letting $x_{i,1} = 1$ be the intercept,
we consider the time-varying coefficient model

(4.9)  $y_i = \beta_1(i/n) + \sum_{j=2}^{5} \beta_j(i/n) x_{i,j} + e_i$,  $i = 1, \ldots, n$.
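To make the fitting step behind (4.9) concrete, the following Python sketch standardizes a covariate and computes a local linear estimate of the coefficient vector at a single time point. It is an illustration under stated assumptions, not the authors' code: the Epanechnikov kernel, the helper names (`epanechnikov`, `standardize`, `local_linear_tvc`) and the synthetic data are hypothetical choices, and the bandwidth 0.13 is used purely as an illustrative value.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel with support [-1, 1] (an assumed kernel choice)."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)


def standardize(v):
    """Rescale a series to zero mean and unit variance."""
    return (v - v.mean()) / v.std()


def local_linear_tvc(y, X, t, bandwidth):
    """Local linear estimate of beta(t) in y_i = x_i^T beta(i/n) + e_i."""
    n, p = X.shape
    times = np.arange(1, n + 1) / n
    weights = epanechnikov((times - t) / bandwidth)
    # Local expansion: beta(i/n) is approximated by beta(t) + beta'(t) (i/n - t).
    Z = np.hstack([X, X * (times - t)[:, None]])
    ZtW = Z.T * weights                     # weight each observation
    theta = np.linalg.solve(ZtW @ Z, ZtW @ y)
    return theta[:p]                        # first p entries estimate beta(t)


# Synthetic illustration (not the Hong Kong data): intercept plus one covariate.
rng = np.random.default_rng(0)
n = 730
grid = np.arange(1, n + 1) / n
x2 = standardize(rng.standard_normal(n))
X_demo = np.column_stack([np.ones(n), x2])
beta1_true = np.sin(2 * np.pi * grid)       # time-varying intercept
beta2_true = 0.5                            # constant slope
y_demo = beta1_true + beta2_true * x2 + 0.1 * rng.standard_normal(n)

beta_hat = local_linear_tvc(y_demo, X_demo, t=0.25, bandwidth=0.13)
print(beta_hat)
```

Evaluating `local_linear_tvc` on a grid of `t` values gives an estimated coefficient curve; near the boundaries fewer observations fall inside the kernel window, so estimates there are noisier.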

The dataset has been studied by [7, 19, 20] by assuming that the observations
are i.i.d., while Zhou and Wu [56] found substantial dependence among
the fitted residuals. Here we model the process by (2.3) and apply the
model selection method of Section 3. The selected bandwidth and tuning
parameter are $b_n = 0.13$ and $\chi_n = 0.072$, respectively. The information criterion
(3.7) selects the intercept ($x_{i,1}$), nitrogen dioxide ($x_{i,3}$) and dust ($x_{i,4}$)
as relevant predictors. Fan and Zhang [20] did not consider the ozone effect
($x_{i,5}$) and concluded that sulphur dioxide ($x_{i,2}$) is not statistically significant.
We then apply the hypothesis testing procedure in Section 4.2 to examine
whether the selected variables really have time-varying contributions. With
5000 simulated $T_n^{\circ}(A, a, W)$, the results are summarized in Table 3. Hence,
at the 10% significance level, we conclude that $\beta_1(\cdot)$ and $\beta_4(\cdot)$ are time-varying
while $\beta_3(\cdot)$ can be treated as time-invariant, suggesting the model

(4.10)  $y_i = \beta_1(i/n) + \beta_3 x_{i,3} + \beta_4(i/n) x_{i,4} + e_i$,  $i = 1, \ldots, n$,

where $\hat\beta_3 = \int_0^1 \hat\beta_3(t)\,dt = 0.15$, and $\hat\beta_1(\cdot)$ and $\hat\beta_4(\cdot)$ are plotted in Figure 3.
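The p-values in Table 3 come from comparing an observed statistic with simulated draws. A minimal sketch of that comparison is given below, assuming that large values of the statistic count as evidence against constancy (an upper-tail test). The function name `simulated_p_value` is hypothetical, and the chi-square draws are only a stand-in for the simulated statistics, not the law used in the paper.

```python
import numpy as np


def simulated_p_value(observed, simulated):
    """Fraction of simulated statistics at least as large as the observed one."""
    simulated = np.asarray(simulated, dtype=float)
    return float(np.mean(simulated >= observed))


# Stand-in draws: chi-square variates are NOT the paper's limiting law.
rng = np.random.default_rng(1)
draws = rng.chisquare(df=5, size=5000)
p_typical = simulated_p_value(3.0, draws)    # typical value, large p-value
p_extreme = simulated_p_value(30.0, draws)   # extreme value, small p-value
print(p_typical, p_extreme)
```

With 5000 draws the smallest nonzero p-value that can be reported is 1/5000, which is why very strong rejections appear as 0.00 after rounding.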

APPENDIX 

For $a, b \in \mathbb{R}$, write $a \wedge b = \min(a, b)$ and $a \vee b = \max(a, b)$. For a matrix $A$,
recall that $\underline{\rho}(A) = \inf\{|Av| : |v| = 1\}$ and $\bar{\rho}(A) = \sup\{|Av| : |v| = 1\}$. The
proofs of the following two propositions are straightforward, and the details
are omitted.

Fig. 3. Plots of estimated coefficient functions, $\hat\beta_1(\cdot)$ (left) and $\hat\beta_4(\cdot)$ (right).

Proposition A.1. Let $A = (a_{ij})_{1 \le i \le I, 1 \le j \le J}$ be a real matrix. Then
(i) $\max_{i,j} |a_{ij}| \le \bar{\rho}(A) \le \sqrt{IJ} \max_{i,j} |a_{ij}|$; (ii) if $B$ has the same dimension as $A$,
then $\bar{\rho}(A + B) \le \bar{\rho}(A) + \bar{\rho}(B)$; (iii) if $B = (b_{jk})_{1 \le j \le J, 1 \le k \le K} \in \mathbb{R}^{J \times K}$, then
$\bar{\rho}(AB) \le \bar{\rho}(A)\bar{\rho}(B)$ and $\underline{\rho}(AB) \ge \underline{\rho}(A)\underline{\rho}(B)$; and (iv) $\bar{\rho}(aa^T) = |a|^2$ for any
column vector $a$.
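The four inequalities of Proposition A.1 are easy to check numerically. The sketch below does so for random matrices, reading the two quantities defined above as the smallest and largest singular values. The function names (`rho_upper`, `rho_lower`, `check_proposition_a1`) and the test matrices are hypothetical; this is a sanity check, not a proof.

```python
import numpy as np


def rho_upper(A):
    """sup{|Av| : |v| = 1}, the largest singular value of A."""
    return float(np.linalg.norm(A, 2))


def rho_lower(A):
    """inf{|Av| : |v| = 1}, the square root of the smallest eigenvalue of A^T A."""
    smallest = np.linalg.eigvalsh(A.T @ A)[0]
    return float(np.sqrt(max(smallest, 0.0)))


def check_proposition_a1(A, B, C, a, tol=1e-10):
    """Return a dict of booleans, one per inequality in Proposition A.1."""
    n_rows, n_cols = A.shape
    amax = np.abs(A).max()
    return {
        "i": bool(amax <= rho_upper(A) + tol
                  and rho_upper(A) <= np.sqrt(n_rows * n_cols) * amax + tol),
        "ii": bool(rho_upper(A + B) <= rho_upper(A) + rho_upper(B) + tol),
        "iii_upper": bool(rho_upper(A @ C) <= rho_upper(A) * rho_upper(C) + tol),
        "iii_lower": bool(rho_lower(A @ C) >= rho_lower(A) * rho_lower(C) - tol),
        "iv": bool(abs(rho_upper(np.outer(a, a)) - a @ a) <= tol),
    }


rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))   # I = 4, J = 3
B = rng.standard_normal((4, 3))   # same dimension as A
C = rng.standard_normal((3, 2))   # J x K with K = 2
a = rng.standard_normal(5)
results = check_proposition_a1(A, B, C, a)
print(results)
```

The lower quantity is computed from the eigenvalues of $A^T A$ rather than from the reduced singular value decomposition, so that it is correctly zero when $A$ has more columns than rows.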

Proposition A.2. Assume that $A$ is a nonsingular square matrix and
that $E$ is a matrix with the same dimension. If $\bar{\rho}(A^{-1}E) < 1$, then $A + E$ is
nonsingular and $\bar{\rho}\{(A + E)^{-1} - A^{-1}\} \le \bar{\rho}(E)\bar{\rho}(A^{-1})^2 / \{1 - \bar{\rho}(A^{-1}E)\}$.
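A small numerical check of the perturbation bound in Proposition A.2 is sketched below, using the spectral norm for the upper quantity. The function name `inverse_perturbation` and the two matrices are hypothetical and chosen only so that the contraction condition holds.

```python
import numpy as np


def inverse_perturbation(A, E):
    """Return (lhs, rhs) of the bound in Proposition A.2 in the spectral norm."""
    A_inv = np.linalg.inv(A)
    contraction = np.linalg.norm(A_inv @ E, 2)
    if contraction >= 1.0:
        raise ValueError("the bound requires the norm of A^{-1} E to be below 1")
    lhs = np.linalg.norm(np.linalg.inv(A + E) - A_inv, 2)
    rhs = np.linalg.norm(E, 2) * np.linalg.norm(A_inv, 2) ** 2 / (1.0 - contraction)
    return float(lhs), float(rhs)


A = np.array([[2.0, 0.5, 0.0],
              [0.1, 3.0, 0.2],
              [0.0, 0.3, 4.0]])
E = 0.1 * np.array([[1.0, -1.0, 0.5],
                    [0.2, 0.3, -0.4],
                    [0.0, 1.0, 1.0]])
lhs, rhs = inverse_perturbation(A, E)
print(lhs, rhs)
```

The bound follows from the identity $(A + E)^{-1} - A^{-1} = -(I + A^{-1}E)^{-1} A^{-1} E A^{-1}$ together with a Neumann series argument, which is why the contraction condition is needed.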

Let $\mathcal{F}_{i,j} = (\varepsilon_i, \ldots, \varepsilon_j)$, $i \le j$. Define the projection operator

$\mathcal{P}_k \cdot = E(\cdot | \mathcal{F}_k) - E(\cdot | \mathcal{F}_{k-1})$,  $k \in \mathbb{Z}$.

Let $\vartheta_k(t) = J(t; \mathcal{F}_k)$ be a zero mean process. Write $t_{i,n} = i/n$, $i = 1, \ldots, n$.
Lemmas A.1 and A.2 provide $\mathcal{L}^q$-bounds for linear and quadratic forms of
$\{\vartheta_k(t_{k,n})\}_{k=1}^{n}$, respectively. To prove Theorem 3.1, we need Lemmas A.1
and A.3.

Lemma A.1. Assume $\Theta_{0,q}(J) < \infty$, $q > 1$. Write $q' = q \wedge 2$. Let
$\{A_{k,n}(t)\}_{k=1}^{n}$, $t \in [0,1]$, be a sequence of real matrix functions, and define
$S_n(t) = \sum_{k=1}^{n} A_{k,n}(t)\vartheta_k(t_{k,n})$. Then:

(i) $\|S_n(t)\|_q \le C_q [\sum_{k=1}^{n} |\bar{\rho}\{A_{k,n}(t)\}|^{q'}]^{1/q'} \Theta_{0,q}(J)$;
(ii) $\|\sup_{t \in [0,1]} |S_n(t)|\|_q \le C_q n^{1/q'} A_n \Theta_{0,q}(J)$,

where $A_n = \sup_{t \in [0,1]} [\bar{\rho}\{A_{1,n}(t)\} + \sum_{k=1}^{n-1} \bar{\rho}\{A_{k+1,n}(t) - A_{k,n}(t)\}]$.

Proof. Let $D_{k,l,n} = E\{\vartheta_k(t_{k,n}) | \mathcal{F}_{k-l,k}\} - E\{\vartheta_k(t_{k,n}) | \mathcal{F}_{k-l+1,k}\}$. Then
$\vartheta_k(t_{k,n}) - E\{\vartheta_k(t_{k,n})\} = \sum_{l=0}^{\infty} D_{k,l,n}$ and $D_{k,l,n}$, $k = 1, \ldots, n$, form martingale
differences. By the Burkholder and the Minkowski inequalities, we have

$\| \sum_{k=1}^{n} A_{k,n}(t) D_{k,l,n} \|_q^{q'} \le C_q \sum_{k=1}^{n} |\bar{\rho}\{A_{k,n}(t)\}|^{q'} \|D_{k,l,n}\|_q^{q'}$.

Since $\|D_{k,l,n}\|_q = \|E\{\vartheta_l(t_{k,n}) | \mathcal{F}_{0,l}\} - E\{\vartheta_{l,\{0\}}(t_{k,n}) | \mathcal{F}_{0,l}\}\|_q \le \delta_{l,q}(J)$, (i) follows.
We now prove (ii). By Doob's inequality and the summation by parts
formula, we have $\| \sup_{t \in [0,1]} | \sum_{k=1}^{n} A_{k,n}(t) D_{k,l,n} | \|_q \le C_q A_n n^{1/q'} \delta_{l,q}(J)$,
entailing (ii). □

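Lemma A.2. Assume $\Theta_{0,2q}(J) < \infty$, $q \ge 2$. Let $\{Q_{i,j,n}\}_{1 \le i < j \le n}$ be real
matrices and $L_n = \sum_{1 \le i < j \le n} \vartheta_i(t_{i,n})^T Q_{i,j,n} \vartheta_j(t_{j,n})$. Then

$\|L_n - E(L_n)\|_q \le C_q n^{1/2} Q_n \Theta_{0,2q}(J)^2$,

where $Q_n^2 = (\max_i \sum_{j=i+1}^{n} |\bar{\rho}(Q_{i,j,n})|^2) \vee (\max_j \sum_{i=1}^{j-1} |\bar{\rho}(Q_{i,j,n})|^2)$.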

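Proof. Let $\tilde\vartheta_k(t) = E\{\vartheta_k(t) | \mathcal{F}_{k-m,k}\}$ be the $m$-dependent approximated
process and $\tilde L_n$ be the corresponding quadratic form. If $l > 2m$,
$\mathcal{P}_{j-l}\{\tilde\vartheta_i(t_{i,n})^T Q_{i,j,n} \tilde\vartheta_j(t_{j,n})\} = 0$. Hence

$\|\tilde L_n - E(\tilde L_n)\|_q \le \sum_{l=0}^{2m} \| \sum_{j=2}^{n} \sum_{i=1}^{j-1} \mathcal{P}_{j-l}\{\tilde\vartheta_i(t_{i,n})^T Q_{i,j,n} \tilde\vartheta_j(t_{j,n})\} \|_q$,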

where 

n /j-l-l j-i \ 



3=2 



i=l i=j—U 



< C q nQil 2 9 , 2q (jy 



By Lemma A.1 and the arguments of Proposition 1 in [34], we have
$\|\{L_n - E(L_n)\} - \{\tilde L_n - E(\tilde L_n)\}\|_q \le C_q n^{1/2} Q_n \Theta_{0,2q}(J)^2$. So Lemma A.2 follows. □

Lemma A.3. Assume $\sup_{t \in [0,1]} \|J(t; \mathcal{F}_0)\|_\iota < \infty$, $\iota > 2$, and $\Theta_{n,\iota}(J) =
O(n^{-\nu})$ for some $\nu > 1/2 - 1/\iota$. Let $S_{K,n}(t) = (nb_n)^{-1} \sum_{k=1}^{n} K\{(k/n - t)/b_n\}
\vartheta_k(t_{k,n})$. Then

(A.1)  $\sup_{b_n \le t \le 1 - b_n} |S_{K,n}(t)| = O_p\{v_n/(nb_n)\}$, where $v_n = n^{1/\iota} + (nb_n \log n)^{1/2}$.

Proof. Let S* = rao^sup^^^ |S A> (t)|. By Theorem 2(h) in [35], 
there exist constants C\, C 2 > such that, for all A > 1 and I, 

/ 1+3 
i=l 



pr max 

\ 0<j<nb r 



(A.2) 



> Xv r , 



nb n 



<Ci T r^-+2e W {-(Xv n ) 2 /(nb n C 2 )}. 



Note that [b n ,l — b n ] C Uj<i /6„b°«' + l)°n]- Using the summation by parts 
formula, since -fT has support [—1, 1], we have (A.l) in view of (A.2) and 

nb„ 



MS* n >\v n ) = 0(b- 1 ) 
by choosing a sufficiently large A. □ 



Ci-- A- + 2exp{-(\v n ) 2 /(nb n C 2 )} 
For $l \in \{0, 1, 2\}$, let

$R_{n,l}(t) = (nb_n)^{-1} \sum_{i=1}^{n} x_i e_i \{(i/n - t)/b_n\}^l K\{(i/n - t)/b_n\}$.
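The kernel-weighted sums $R_{n,l}(t)$ are straightforward to compute directly. The sketch below is one way to do it, assuming an Epanechnikov kernel and the $(nb_n)^{-1}$ normalization; the function name `R_nl` and the sanity-check inputs are hypothetical.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel with support [-1, 1] (an assumed kernel choice)."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)


def R_nl(t, X, e, bandwidth, l, kernel=epanechnikov):
    """(n b_n)^{-1} sum_i x_i e_i {(i/n - t)/b_n}^l K{(i/n - t)/b_n}."""
    n = X.shape[0]
    u = (np.arange(1, n + 1) / n - t) / bandwidth
    weights = (u ** l) * kernel(u)
    return (X * e[:, None]).T @ weights / (n * bandwidth)


# Sanity check with x_i = 1 and e_i = 1: the sums reduce to Riemann sums of
# u^l K(u), so l = 0 is close to 1 and l = 1 is close to 0 for a symmetric kernel.
n = 1000
X_ones = np.ones((n, 1))
e_ones = np.ones(n)
r0 = R_nl(0.5, X_ones, e_ones, bandwidth=0.1, l=0)
r1 = R_nl(0.5, X_ones, e_ones, bandwidth=0.1, l=1)
print(r0, r1)
```

With genuine zero-mean errors in place of `e_ones`, the same function returns the noise term whose size the lemmas above control.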

PROOF of Theorem 3.1. By Lemma A.l, we have 
1 

AM(G,t)- 1 R„ j0 (t)(it 

1 n ( 
-^AM(G,!/n)- 1 x 1 e ! + 0J 

n i=i ^ 



o 



n n 



By m-dependence approximation, under Conditions (A2), (A3) and (A4), 
we obtain 



^AM(G,i/n)" 1 x ! e i => N {°> J q AH(t)A T dt|. 



By Lemmas A.l and A. 3, and the argument in the proof of Theorem 3 in [56], 
we have sup tg[01 ] |/3(i) - /3(f) | = O p (ip n ) and 

sup \M(G,t){P(t) -p{t) - 2-^2(3" (t)} -Rn, (t)l 

b n <t<l—b n 

(A.3) " 

= Op((p n p n J. 

Therefore, 



a — a 



£ n = / AM(G,t)- 1 R ni o(t)dt + O p (^„Pn + 6n^n + 6n)- 
JO 



Under our bandwidth conditions, V9 n ,/? n + b n (p n + 6^ = o(n x / 2 ). So Theo- 
rem 3.1 follows. □ 

Let $\gamma_{k,2}(J) = \sum_{j=0}^{\infty} \delta_{j,2}(J)\delta_{j+|k|,2}(J)$. Lemma A.4 provides continuity
properties of long-run covariance matrices for stochastically Lipschitz continuous
processes.

Lemma A.4. Assume $J \in \mathrm{Lip}$ and $\Theta_{0,2}(J) < \infty$. Then: (i) for any nonnegative
sequence $a_n \to 0$, $\sup_{|t_1 - t_2| \le a_n} \bar{\rho}\{\Lambda(J, t_1) - \Lambda(J, t_2)\} = o(1)$; (ii) if,
in addition, $\Theta_{n,2}(J) = O(n^{-\nu})$ for some $\nu > 0$, then $\sup_{|t_1 - t_2| \le a_n} \bar{\rho}\{\Lambda(J, t_1) -
\Lambda(J, t_2)\} = O\{a_n^{\nu/(1+\nu)}\}$; and (iii) if $\inf_{t \in [0,1]} \underline{\rho}\{\Lambda(J, t)\} > 0$, then (i) and (ii)
hold for the inverse $\Lambda^{-1}(J, t)$.
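PROOF. We first observe that

$\bar{\rho}[E\{\vartheta_i(t_1)\vartheta_j(t_2)^T\}] \le \sum_{s \in \mathbb{Z}} \|\mathcal{P}_s \vartheta_i(t_1)\| \|\mathcal{P}_s \vartheta_j(t_2)\|
  \le \sum_{s \in \mathbb{Z}} \delta_{i-s,2}(J)\delta_{j-s,2}(J) = \gamma_{|j-i|,2}(J)$.

The Lipschitz continuity implies

(A.4)  $\bar{\rho}(E[\vartheta_i(t_1)\{\vartheta_j(t_2) - \vartheta_j(t_1)\}^T]) \le C\{\gamma_{|j-i|,2}(J) \wedge |t_2 - t_1|\}$,

uniformly. Hence

$\sup_{|t_1 - t_2| \le a_n} \bar{\rho}\{\Lambda(J, t_1) - \Lambda(J, t_2)\} \le C \sum_{k=0}^{\infty} \{\gamma_{k,2}(J) \wedge a_n\}$,

which entails (i) by the dominated convergence theorem. Let $r_n = a_n^{-1/(1+\nu)}$,
which goes to infinity as $n \to \infty$. Since $\gamma_{k,2}(J) \le \Theta_{k,2}(J)\Theta_{0,2}(J)$, we
have $\sum_{k=0}^{\infty} \{\gamma_{k,2}(J) \wedge a_n\} = O(r_n a_n + r_n^{-\nu})$, and (ii) follows. Then (iii) follows by
Proposition A.2. □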
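The rate in part (ii) of Lemma A.4 comes from balancing two terms. As a worked version of that step, assume the usual convention $\Theta_{m,2}(J) = \sum_{k \ge m} \delta_{k,2}(J)$ (this definition is not restated in the appendix, so it is an assumption here) and split the sum at $r_n$:

```latex
\begin{align*}
\sum_{k=0}^{\infty} \{\gamma_{k,2}(J) \wedge a_n\}
  &\le \sum_{0 \le k < r_n} a_n + \sum_{k \ge r_n} \gamma_{k,2}(J) \\
  &\le (r_n + 1)\, a_n
     + \sum_{j=0}^{\infty} \delta_{j,2}(J) \sum_{k \ge r_n} \delta_{j+k,2}(J) \\
  &\le (r_n + 1)\, a_n + \Theta_{0,2}(J)\, \Theta_{\lceil r_n \rceil, 2}(J)
   = O(r_n a_n + r_n^{-\nu}).
\end{align*}
With $r_n = a_n^{-1/(1+\nu)}$,
\[
  r_n a_n = a_n^{\nu/(1+\nu)}, \qquad r_n^{-\nu} = a_n^{\nu/(1+\nu)},
\]
so both terms are of order $a_n^{\nu/(1+\nu)}$.
```

The choice of $r_n$ is the one that equalizes the two contributions, so no other cut-off gives a better rate from this argument.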

Let $W_0(\cdot)$ be a continuous mapping from $[0, 1]$ to symmetric matrices
in $\mathbb{R}^{p \times p}$. For $l \in \{1, 2\}$, define $\Lambda_{W_0,l} = \mathrm{tr}[\int_0^1 \{W_0(t)\Lambda(L, t)\}^l\,dt]$. Before we
prove Theorem 3.2, we shall first establish a parallel result for

(A.5)  $T_n^{\circ}(W_0) = \int_0^1 R_{n,0}(t)^T W_0(t) R_{n,0}(t)\,dt$.

Let $r_{1,n} = (nb_n)^{-1} \sum_{k=0}^{\infty} \{\gamma_{k,2}(L) \wedge b_n\}$ and $r_{2,n} = (nb_n)^{-1} \sum_{k=0}^{\infty} [\{k/(nb_n)\} \wedge
1]\gamma_{k,2}(L)$.

Lemma A.5. Assume $L \in \mathrm{Lip}$ and $\Theta_{0,4}(L) < \infty$. If $b_n \to 0$ and $nb_n \to
\infty$, then

(A.6)  $nb_n^{1/2}[T_n^{\circ}(W_0) - E\{T_n^{\circ}(W_0)\}] \Rightarrow N(0, 4K_2^* \Lambda_{W_0,2})$,

and

(A.7)  $E\{T_n^{\circ}(W_0)\} = (nb_n)^{-1} K^*(0) \Lambda_{W_0,1} + O(r_{1,n} + r_{2,n}) + o(n^{-1} b_n^{-1/2})$.

Proof. Let $\zeta_k(t) = L(t; \mathcal{F}_k)$ and $\tilde\zeta_k(t) = E\{\zeta_k(t) | \mathcal{F}_{k-m,k}\}$ be its
$m$-dependent counterpart and $\tilde\Lambda(L, t)$ be the corresponding long-run covariance
matrix. Then $\tilde\Lambda(L, t) \to \Lambda(L, t)$ uniformly as $m \to \infty$. Let $w_{k,n}(t) =
(nb_n)^{-1} K\{(k/n - t)/b_n\}$, $k = 1, \ldots, n$, and $Q_{i,j,n} = \int_0^1 w_{i,n}(t) W_0(t) w_{j,n}(t)\,dt$.
The central limit theorem (A.6) is a multivariate generalization of Theorem A1
in [54] by using Propositions A.1 and A.2. We shall only detail steps
that require special attention on the dimensionality. Essentially, we need to
show that



lim lim 

m— ¥00 n— >oo 



n j—2m 

2 b n £ ^{DSQij,n^(D* n D*];)Q4 n D*„} 



n 

j=2m+l i=l 



2 A W ,2, 

where D£ n = 7> fe ££ C fc +i(<fc,«)- Since s ( 6 jU 6 £) = A(L,t fcjTl ), by Lem- 
ma A. 4, we have 

n j—2m 

£ £ S{D*T QiJ . re £;(D* n D*T)QT. n D* n } 

j=2m+l i=l 

= E tr[{W (^ n )A(L,^ n )} 2 ](Y w i>n (t) Wj , n (t)dt 

l<i<j<n 

+ o{n 2 6„(n 2 6 n )~ 2 } + 0{^(m)n 2 & n (n 2 6 n )^ 2 } + 0{mn(n 2 6 n )- 2 } 

for some function £{m) — > as m — >• oo. Then (A. 6) follows. For (A. 7), by 
the proof of Theorem 1 in [54] , we have 

n 

E{T°(W )} = ^tr{Q i)i , n A(M i , ri )} + 0(r 1 

i=l 

where 

n / oo \ 

r 3.« = E £ + £ P(^>)7| J -i|, 2 (J)<C'n6 ri ( ? i 2 6 n )- 1 eo, 2 (L) 2 . 



i=l \j=—oo j=n+ly 



Since ^?=i tr{Q M , n A(L, t i>n )} = (?i6 n )- 1 K*(0)A Wo ,i + o(n~V /2 ), (A.7) 
follows. □ 

Proof of Theorem 3.2. Let $B_n = [0, b_n] \cup [1 - b_n, 1]$. Lemma A.1 implies
$\sup_{t \in B_n} |R_{n,0}(t)| = O_p\{(nb_n)^{-1/2}\}$ and $\sup_{t \in B_n} |U_n(t) - E\{U_n(t)\}| =
O_p\{(nb_n)^{-1/2}\}$. By the proof of Theorem 1 in [56], we have $\sup_{t \in B_n} |\hat\beta(t) -
\beta(t)| = O_p\{(nb_n)^{-1/2} + b_n^2\}$. Hence,

f {A/3(t) - a} T W(t){A/3(i) - a} dt = O^ra" 1 + 

By (A.3) and Lemma A.3, we have $\sup_{b_n \le t \le 1 - b_n} |A\hat\beta(t) - a| = O_p(\varphi_n)$ and

(A.8)  $\sup_{b_n \le t \le 1 - b_n} |A\hat\beta(t) - a - AM(G, t)^{-1} R_{n,0}(t)| = O_p(\varphi_n \rho_n)$.

Let $W_0(t) = M(G, t)^{-1} A^T W(t) A M(G, t)^{-1}$. Since $\sup_{t \in B_n} |R_{n,0}(t)| =
O_p\{(nb_n)^{-1/2}\}$,

$\int_{B_n} R_{n,0}(t)^T W_0(t) R_{n,0}(t)\,dt = O_p(n^{-1})$.

Under our bandwidth conditions, $nb_n^{1/2} \varphi_n^2 \rho_n = o(1)$. For (3.3), by Lemma A.5,
it suffices to show that both $r_{1,n}$ and $r_{2,n}$ are of order $o(n^{-1} b_n^{-1/2})$. By
the proof of Lemma A.4, $nb_n r_{1,n} = O\{b_n^{\nu/(1+\nu)}\} = o(b_n^{1/2})$ since $\nu > 1$. Let
$r_n = (nb_n)^{1/(2+\nu)}$. Then

$\sum_{k=0}^{\infty} [\{k/(nb_n)\} \wedge 1]\gamma_{k,2}(L) \le \frac{r_n(r_n + 1)}{2nb_n} \Theta_{0,2}(L)^2 + \Theta_{r_n,2}(L)\Theta_{0,2}(L) = O(r_n^{-\nu})$.

Hence, we have $nb_n r_{2,n} = O\{(nb_n)^{-\nu/(2+\nu)}\} = o(b_n^{1/2})$ since $\nu > 1$, and (3.3) follows.
Note that

$T_n(A, \hat a, W) - T_n(A, a, W) = I_n - 2II_n$,

where $I_n = \int_0^1 (\hat a - a)^T W(t)(\hat a - a)\,dt = O_p(n^{-1})$, and by (A.8) and Lemmas
A.1 and A.3,

$II_n = \int_0^1 (\hat a - a)^T W(t)\{A\hat\beta(t) - a\}\,dt$
$\quad = (\hat a - a)^T \{\int_0^1 W(t) A M(G, t)^{-1} R_{n,0}(t)\,dt + O_p(\varphi_n \rho_n + b_n \varphi_n)\}$
$\quad = O_p\{n^{-1/2}(n^{-1/2} + \varphi_n \rho_n)\}$.

Note that $(nb_n)^{1/2} \varphi_n \rho_n = o(1)$, and Theorem 3.2 follows. □

Proof of Proposition 3.1. Under the local alternative (3.5), we have 
A(3"(t) = dJ"(t) and 

T n (A, a, W) - 7^ (W ) = d 2 n [ f(t) T W(t)i(t) dt + I n + 2II n , 

Jo 

where by (A. 3) and Lemmas A.l and A. 3, 

I n = f {A0(t) - Af3(t)} T W(t){A~p(t) - A/3(t)} dt - I^(Wo) 
Jo 

= O p {((p n + d n b 2 n )(<PnPn + d n b 2 n ) + (tfnPn + dntifytfn}, 

the weight matrix W (t) = M(G,f)~ 1 A T W(t)AM(G,i)" 1 and 

II n = d n f f(t) T W(t){A^(t) - A(3(t)} dt = O p {d n {ip nPn + n" 1 / 2 + d n b 2 n )}. 
Jo 

Since nbl/ 2 tpnPn = o(l), (3.6) follows from Lemma A. 5. □ 
Recall that $D \subset D^*$ is a subset with complement $\bar D$, and $D_0$ is the true set
of relevant predictors. Let $\hat e_{D,i} = y_i - x_{D,i}^T \hat\beta_D(i/n)$, $1 \le i \le n$. Then $\mathrm{RSS}(D) =
\sum_{i=1}^{n} \hat e_{D,i}^2$. Lemma A.6 provides bounds for $\mathrm{RSS}(D) - \sum_{i=1}^{n} e_i^2$ for both cases
$D_0 \subset D$ and $D_0 \not\subset D$.

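Lemma A.6. Assume (A1)-(A4), $\Theta_{0,4}(L) < \infty$, $\Theta_{n,\iota}(L) = O(n^{-\nu})$ for
some $\nu > 1/2 - 1/\iota$, $b_n \to 0$ and $nb_n \to \infty$. Then (i) if $D_0 \subset D$, then

$\mathrm{RSS}(D) = \sum_{i=1}^{n} e_i^2 + O_p\{n\varphi_n(\varphi_n + \rho_n)\}$;

and (ii) if $D_0 \not\subset D$, then

$\mathrm{RSS}(D) = \sum_{i=1}^{n} e_i^2 + \sum_{i=1}^{n} \beta_{\bar D}(i/n)^T E\{x_{\bar D,i} x_{\bar D,i}^T\} \beta_{\bar D}(i/n)
  + O_p\{n^{1/2} + n\varphi_n(\varphi_n + \rho_n)\}$.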

Proof. For (i), since Dq C D, we have eo,i = e{ + xj i{fir){i/n) — 
(3 D (i/n)} and 

n n 

RSS(D) = = E e ? + J " - 2// «' 

i=l i=l 

where, by the proof of Theorem 3.1, I n = X^=i[ x i) APd^/ 71 ) ~ / n )Y\ 2 = 
Op(mp n ) and, by (A. 3) and Lemmas A.l and A. 3, 

n 

II n = Y,(^) T AlA D 0(i/n) - (3(i/n)} 
i=l 
n 

= ^2( Xiei ) T AlA D M(G, i/n)- 1 R„ i0 (i/n) 
i=l 

+ O p (nb n ip n + n<£ n/ o n + n 1/2 b n ). 
Since, by Lemma A. 2, 

^ n n 

— ^ ^(x iei ) T AjA D M(G, ^-'(x.e^ir 
n n i=i j=i 

= O p (b~ 1 / 2 + b~ 1 ), 

we have II n = O p (b~ l + nip n p n + n 1 / 2 ^). Since b~ l + n l / 2 b n = o{mp n ((p n + 
p n )}, (i) follows. For (ii), since e D ,i = + xJ i {/3 D (i/n) - (3 D (i/n)} + 



j/n — i/n 
X E i@D (V n ) > we have

n 

RSS(D) + £ + 2//° + III° n , 

i=i 

where, by (i), 

n n 

J n = + x D,i{0I>(*AO - /3l>(V n )}] 2 - JZ e i = °p{ n( Pn(<Pn + Pn)} 

i=l t=l 

and, by Lemma A.l and the argument on the quantity // n in (i), 

n 

II° n = J> + *5,i{jM*» " /M^llxT^/n) 

i=l 

= O^n 1 / 2 + ft" 1 + n<p n p n + n 1 / 2 ^ 2 ). 
In addition, by Lemma A.l, 

n n 

IlF n = Y^H^Dii/n)} 2 =^/3 5 ( l /r l ) T £{x A! xIjfe( l /n) + O^n 1 ' 2 ), 
i=i i=i 

Lemma A. 6 follows. □ 

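Proof of Theorem 3.3. By Lemma A.1, $\sum_{i=1}^{n} (e_i^2 - Ee_i^2) = O_p(n^{1/2})$.
Lemma A.6 implies

$\log\{\mathrm{RSS}(D)\} = \log(\sum_{i=1}^{n} e_i^2) + O_p\{\varphi_n(\varphi_n + \rho_n)\}$

for $D_0 \subset D$, and

$\log\{\mathrm{RSS}(D)\} = \log[\sum_{i=1}^{n} e_i^2 + \sum_{i=1}^{n} \beta_{\bar D}(i/n)^T E\{x_{\bar D,i} x_{\bar D,i}^T\} \beta_{\bar D}(i/n)] + o_p(1)$

for $D_0 \not\subset D$. Since $\chi_n = o(1)$ and $\varphi_n(\varphi_n + \rho_n) = o(\chi_n)$, Theorem 3.3 follows. □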

Proof of Proposition 4.1. By Lemma A.l, 

sup \\T&(G,t)-E{Tfc(G,t)}\\=0{(nw n )-V 2 }, 
ie[o,i] 

and, by Lemma A. 2, 

sup ||A(L,t)-E{A(L,t)}||=0(^/ 2 ). 

t€[0,l] 
By (A.4), we have

max \E{\i(L,T n Q n )} - A(L,t in )\ 

l<i<n 

oo 

^ C ^2i~fk(L) A (T n g n )} + 9 

k=0 

= 0{{r nQn yi^ + {nT nen r u }- 

Proposition 4.1 follows by properties of local linear estimates. □ 

Proof of Proposition 4.2. Consider the process $\{z_{t,i}\}_{i \in \mathbb{Z}}$ that satisfies
the recursion

$z_{t,i} = A(t) z_{t,i-1} + \eta(t; \mathcal{F}_i)$,  $i \in \mathbb{Z}$.

Then, for each $t \in [0, 1]$, the process $\{z_{t,i}\}_{i \in \mathbb{Z}}$ is stationary, and there exists
a measurable function $G$ such that $z_{t,i} = G(t; \mathcal{F}_i)$, $i \in \mathbb{Z}$. By condition (T3),
$\rho_A = \sup_{t \in [0,1]} \bar{\rho}\{A(t)\} < 1$. Hence, by condition (T2) and induction, we have

max ||x» - z i/n i \\ < pi max ||x;_ fc - z i/n ;_ fe || 

(A.9) 

fc-i . j 

+ k>2. 
^ n 

Since $\rho_A < 1$ and $\sum_{j=0}^{\infty} j\rho_A^j < \infty$, (4.8) follows by letting $k \to \infty$. It suffices
to show that $G \in \mathrm{Lip}$. For this, by a similar argument of (A.9), we have for
any $k \ge 2$,

$\sup_{t_1, t_2 \in [0,1]} \|z_{t_1,i} - z_{t_2,i}\| \le \rho_A^k \sup_{t_1, t_2 \in [0,1]} \|z_{t_1,i-k} - z_{t_2,i-k}\| + C|t_1 - t_2| \sum_{j=0}^{k-1} \rho_A^j$.

Since $\rho_A < 1$ and $\sum_{j=0}^{\infty} \rho_A^j < \infty$, Proposition 4.2 follows by letting $k \to \infty$.
□
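The stationary approximation used in this proof can be explored numerically. The sketch below takes a scalar special case with Gaussian innovations and a hypothetical coefficient function `a_fun(t) = 0.3 + 0.4 t`. It simulates $x_i = a(i/n) x_{i-1} + \eta_i$ alongside the fixed-$t$ processes $z_{t,i} = a(t) z_{t,i-1} + \eta_i$ driven by the same innovations, and reports a root-mean-square gap between $x_i$ and $z_{i/n,i}$. The burn-in with coefficient `a_fun(0)`, the function name `rms_gap` and the use of an empirical average in place of the norm are assumptions made for illustration only.

```python
import numpy as np


def rms_gap(n, a, burn=200, seed=0):
    """Root-mean-square gap between x_i and its stationary proxy z_{i/n,i}."""
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal(n + burn)
    x = np.zeros(n + burn)
    prev = 0.0
    for k in range(n + burn):
        i = k - burn + 1                   # time index; i <= 0 is burn-in
        coef = a(max(i, 0) / n)            # burn-in uses a(0) (an assumption)
        prev = coef * prev + eta[k]
        x[k] = prev
    gaps = []
    for k in range(burn, n + burn):
        i = k - burn + 1
        coef = a(i / n)                    # coefficient frozen at t = i/n
        powers = coef ** np.arange(k + 1)  # a(t)^0, ..., a(t)^k
        z = np.dot(powers, eta[k::-1])     # sum_j a(t)^j eta_{k-j}
        gaps.append(x[k] - z)
    return float(np.sqrt(np.mean(np.square(gaps))))


def a_fun(t):
    """Hypothetical Lipschitz coefficient function with sup |a(t)| < 1."""
    return 0.3 + 0.4 * t


gap_small_n = rms_gap(50, a_fun)
gap_large_n = rms_gap(400, a_fun)
print(gap_small_n, gap_large_n)
```

In this example the gap shrinks as $n$ grows, which is the qualitative behaviour that a Lipschitz coefficient function and a contraction condition would lead one to expect.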